Bapat–Beg theorem

In probability theory, the Bapat–Beg theorem^[1] gives the joint cumulative distribution function of order statistics of independent but not necessarily identically distributed random variables in terms of the cumulative distribution functions of the random variables. A simple proof of this can be found in ^[2]

Ordinarily, all elements of the sample are obtained from the same population and thus have the same probability distribution. The Bapat–Beg theorem describes the order statistics when each element of the sample is obtained from a possibly different population with a different probability distribution.

1 The theorem
2 The independent identically distributed case
3 Remarks
4 Complexity
5 See also
6 Notes

The theorem

Let $\textstyle X_{i}$ , $\textstyle i=1,\ldots,m$ be independent real valued random variables with cumulative distribution functions $\textstyle F_{i}\left( x\right)$ . Denote the order statistics by $\textstyle Y_{1},Y_{2},\ldots,Y_{m}$ , with $\textstyle Y_{1}\leq Y_{2}\leq\cdots\leq Y_{m}$ . Further let $\textstyle y_{0}=-\infty,$ $\textstyle y_{k%2B1}=%2B\infty$ , and

$i_{0}=0,\quad i_{k%2B1}=m,\quad F_{i}\left( y_{0}\right) =0,\quad F_{i}\left( y_{k%2B1}\right) =1$

for all $\textstyle i.$ For $\textstyle 1\leq n_{1}<n_{2}<\cdots<n_{_{k}}\leq m$ and $\textstyle y_{1}\leq y_{2}\leq\cdots\leq y_{k}$ , the joint cumulative distribution function of the subset $\textstyle Y_{n_{1}},\ldots Y_{n_{k}}$ of the order statistics satisfies

$\begin{align} & F_{Y_{n_{1}},\ldots Y_{n_{k}}}\left( y_{1},\ldots,y_{k}\right) \\ & =\Pr\left\{ Y_{n_{1}}\leq y_{1}\wedge Y_{n_{2}}\leq y_{2}\wedge\cdots\wedge Y_{n_{k}}\leq y_{k}\right\} \\ & =\sum_{i_{k}=n_{k}}^{m}\cdots\sum_{i_{2}=n_{2}}^{i_{3}}\,\sum _{i_{1}=n_{1}}^{i_{2}}\frac{P_{i_{1},\ldots,i_{k}}\left( y_{1},\ldots ,y_{k}\right) }{\left( i_{1}-i_{0}\right) �!\left( i_{2}-i_{1}\right)�!\cdots\left( i_{k%2B1}-i_{k}\right) �!}, \end{align}$

where

$\begin{align} & P_{i_{1},\ldots,i_{k}}\left( y_{1},\ldots,y_{k}\right) \\ & \quad=\text{per}\begin{bmatrix} \left[ F_{i}(y_{j})-F_{i}(y_{j-1})\right] _{\left( i_{j}-i_{j-1}\right) \times1}\end{bmatrix} _{j=1,i=1}^{j=k%2B1,i=m}\end{align}$

is the permanent of a block matrix with the subscript $\textstyle \left( i_{j}-i_{j-1}\right) \times1$ denoting the dimension of a block obtained by repeating the same entry, and the block row index $\textstyle j$ and block column index $\textstyle i$ .

The independent identically distributed case

In the case when the variables $\textstyle X_{1}$ , $\textstyle X_{2},\ldots,X_{m}$ are independent and identically distributed with cumulative probability distribution function $\textstyle F_{i}=F$ for all $\textstyle i$ , the Bapat–Beg theorem reduces to ^[3]

$\begin{align} & F_{Y_{n_{1}},\ldots Y_{n_{k}}}\left( y_{1},\ldots,y_{k}\right) \\ & =\Pr\left\{ Y_{n_{1}}\leq y_{1}\wedge Y_{n_{2}}\leq y_{2}\wedge \cdots\wedge Y_{n_{k}}\leq y_{k}\right\} \\ & =\sum_{i_{k}=n_{k}}^{m}\cdots\sum_{i_{2}=n_{2}}^{i_{3}}\,\sum_{i_{1}=n_{1}}^{i_{2}}m!\prod\limits_{j=1}^{k%2B1}\frac{\left[ F\left( y_{j}\right) -F\left( y_{j-1}\right) \right] ^{i_{j}-i_{j-1}}}{\left( i_{j}-i_{j-1}\right) �!}, \end{align}$

where again

$i_{0}=0,\quad i_{k%2B1}=m,\quad F\left( y_{0}\right) =0,\quad F\left( y_{k%2B1}\right) =1.$

Remarks

No assumption of continuity of the cumulative distribution functions is needed.

The theorem is formulated for the joint cumulative probability distribution function in terms of a subset of the order statistics and ordered arguments. If the inequalities $\textstyle y_{1}\leq y_{2}\leq\cdots\leq y_{k}$ are not imposed, some of the inequalities $\textstyle Y_{n_{j}}\leq y_{j}$ may be redundant (because always $\textstyle Y_{1}\leq Y_{2}\leq\cdots\leq Y_{m})$ and the argument list needs to be first reduced by dropping all $\textstyle y_{j}$ such that $\textstyle y_{i}>y_{j}$ for some $\textstyle i$ .

Complexity

The Bapat–Beg formula involves exponentially many permanents, and the complexity of the computation of the permanent itself is at least exponential (unless P = NP). Thus, in the general case, the formula is not practical. However, when the random variables have only two possible distributions, the complexity can be reduced to $O(m^{2k})$ .^[3] Thus, in the case of two populations, the complexity is polynomial in m for any fixed number of statistics k.

Notes

^ R. B. Bapat and M. I. Beg. Order statistics for nonidentically distributed variables and permanents. Sankhyā Ser. A, 51(1):79–93, 1989. MR 1065561
^ Sayaji Hande. A note on order statistics for nonidentically distributed variables Sankhyā Ser. A, 56(2):365–368, 1994. MR 1664921
^ ^a ^b Glueck; Anis Karimpour-Fard; Jan Mandel; Larry Hunter; Muller (2008). "Fast computation by block permanents of cumulative distribution functions of order statistics from several populations". Communications in Statistics - Theory and Methods 37 (18): 2815–2824. arXiv:0705.3851. doi:10.1080/03610920802001896. PMC 2768298. PMID 19865590. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2768298.
^ N. Balakrishnan and C. R. Rao. Order statistics: an introduction. In Order statistics: theory & methods, volume 16 of Handbook of Statist., pages 3–24. North-Holland, Amsterdam, 1998.